New event detection and topic tracking in Turkish

Authors

  • Fazli Can
  • Seyit Kocberber
  • Ozgur Baglioglu
  • Suleyman Kardas
  • Huseyin Cagdas Ocalan
  • Erkan Uyar
Abstract

Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to the events. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus on finding the first stories of new events, and identifying all subsequent stories on a certain topic defined by a small number of sample stories. In this work, we introduce the first large-scale TDT test collection for Turkish and investigate the NED and TT problems in this language. We present our test collection construction approach which is inspired by the TDT research initiative. We show that in TDT for Turkish with some similarity measures, a simple word truncation stemming method can compete with a lemmatizer-based stemming approach. Our findings show that contrary to our earlier observations on Turkish information retrieval (IR), in NED word stopping has an impact on effectiveness. We demonstrate that the confidence scores of two different similarity measures can be combined in a straightforward manner for higher effectiveness. The influence of several similarity measures on effectiveness is also investigated. We show that it is possible to deploy TT applications in Turkish that can be used in operational settings.
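To make the NED task described above concrete, the following is a minimal sketch, not the authors' system. It assumes a fixed-length prefix as the word truncation stemmer, raw term-frequency vectors, cosine similarity, and a single novelty threshold. The names `truncate_stem`, `vectorize`, `cosine`, and `detect_new_events`, as well as the values of `PREFIX_LEN` and `NOVELTY_THRESHOLD`, are hypothetical choices made for illustration and are not taken from the abstract.

```python
import math
from collections import Counter

PREFIX_LEN = 5           # assumed truncation length (illustrative only)
NOVELTY_THRESHOLD = 0.2  # assumed decision threshold (illustrative only)


def truncate_stem(word, n=PREFIX_LEN):
    """Word truncation stemming: keep only the first n characters."""
    return word[:n]


def vectorize(text):
    """Build a term-frequency vector over truncated, lowercased tokens.

    Note: str.lower() is not Turkish-locale aware; real input would need
    proper handling of dotted and dotless i.
    """
    return Counter(truncate_stem(w) for w in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def detect_new_events(stories, threshold=NOVELTY_THRESHOLD):
    """Flag a story as a first story when its best similarity to every
    earlier story in the stream falls below the threshold."""
    seen = []
    flags = []
    for text in stories:
        vec = vectorize(text)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


if __name__ == "__main__":
    stream = [
        "deprem istanbul sarsıntı hasar",
        "istanbul deprem hasar büyük",
        "futbol maçı galatasaray kazandı",
    ]
    print(detect_new_events(stream))
```

In this sketch the stories are processed in temporal order, so each story is compared only with those that arrived before it. Combining two similarity measures, as the abstract mentions, could be sketched by averaging their scores before thresholding, but the abstract does not specify how the combination is performed.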

Similar resources

Doppler and bearing tracking using fuzzy adaptive unscented Kalman filter

The topic of Doppler and Bearing Tracking (DBT) problem is to achieve a target trajectory using the Doppler and Bearing measurements. The difficulty of DBT problem comes from the nonlinearity terms exposed in the measurement equations. Several techniques were studied to deal with this topic, such as the unscented Kalman filter. Nevertheless, the performance of the filter depends directly on the...

Novelty detection for topic tracking

Multi-source web news portals provide various advantages such as richness in news content and an opportunity to follow developments from different perspectives. However, in such environments, news variety and quantity can have an overwhelming effect. New event detection and topic tracking studies address this problem. They examine news streams and organize stories according to their events; how...

Novelty Detection in Topic Tracking

NOVELTY DETECTION IN TOPIC TRACKING Cem Aksoy M.S. in Computer Engineering Supervisors Prof. Dr. Fazlı Can Asst. Prof. Dr. Seyit Koçberber July, 2010 News portals provide many services to the news consumers such as information retrieval, personalized information filtering, summarization and news clustering. Additionally, many news portals using multiple sources enable their users to evaluate de...

The Design of a Topic Tracking System

This paper describes research into the development of techniques to build effective Topic Tracking systems. Topic tracking involves tracking a given news event in a stream of news stories i.e. finding all subsequent stories in the news stream that discuss the given event. This research has grown out of the Topic Detection and Tracking (TDT) initiative sponsored by DARPA. The paper describes the...

Optimizing Story Link Detection is not Equivalent to Optimizing New Event Detection

Link detection has been regarded as a core technology for the Topic Detection and Tracking tasks of new event detection. In this paper we formulate story link detection and new event detection as information retrieval task and hypothesize on the impact of precision and recall on both systems. Motivated by these arguments, we introduce a number of new performance enhancing techniques including p...

Journal:
  • JASIST

Volume 61  Issue 

Pages  -

Publication date: 2010